Electronic material for:

Modeling and Rendering Architecture from Photographs:
A Hybrid Geometry- and Image-Based Approach

          http://www.cs.berkeley.edu/~debevec/Research/

Paul E. Debevec      debevec@cs.berkeley.edu
                     http://www.cs.berkeley.edu/~debevec/
Camillo J. Taylor    camillo@cs.berkeley.edu
                     http://HTTP.CS.Berkeley.EDU/~camillo/
Jitendra Malik       malik@cs.berkeley.edu
                     http://HTTP.CS.Berkeley.EDU/~malik/

Computer Vision Group
          http://http.cs.berkeley.edu/projects/vision/vision_group.html
Computer Science Division
          http://www.cs.berkeley.edu/
University of California at Berkeley
          http://www.berkeley.edu/

========== TIFF Images

Here are electronic originals of the figures that we used in our
paper.  The numbering is the same, except for fig07b.tif which did not
appear in the paper due to space limitations.  More information, our
latest results, and an expanded version of the paper are available
online at: http://www.cs.berkeley.edu/~debevec/Research/

fig01.tif   Schematic comparison of geometry-based and image-based
            modeling/rendering systems, and our hybrid approach.

fig02ab.tif Images from the photogrammetric modeling system: the image
            viewer showing marked features, and the model viewer showing
            the recovered model.  This model was recovered from just the
            one photograph, which was made possible by embedding
            constraints of symmetry into the model.  The tower is
            the Campanile at the University of California at Berkeley.

fig02cd.tif (c) Reprojected model edges, showing the accuracy of the
            recovered model (only edges belonging to front-facing
            faces are shown).  (d) A novel view of the clock tower
            generated from three images and view-dependent texture-
            mapping.  The virtual camera position is 250 feet above
            the ground.

fig07.tif   Three of twelve images used to reconstruct a high school
            building (University High School in Urbana, IL), with
            marked features shown in green.  The original images used
            were 768 x 512 pixels.

fig07b.tif  The edges of the recovered model, reprojected through the
            corresponding recovered camera positions and overlaid on the
            same three images.  The fact that the blue reprojected edges
            conform correctly to the original photographs indicates that
            the building has been reconstructed accurately.  Only edges
            belonging to front-facing faces are shown.
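
            The reprojection check described above can be sketched roughly
            as follows.  This is a minimal illustration, not the code used
            for the paper: it assumes an ideal pinhole camera with no lens
            distortion, and the names project_point, is_front_facing, and
            reproject_edge are hypothetical.

```python
def project_point(point, rotation, translation, focal_length):
    """Project a 3D world point to 2D image coordinates (pinhole model).

    rotation is a 3x3 row-major matrix and translation a 3-vector taking
    world coordinates into the camera frame (assumed convention).
    """
    cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None  # point is behind the camera
    return (focal_length * cam[0] / cam[2], focal_length * cam[1] / cam[2])


def is_front_facing(normal, face_point, camera_center):
    """True if the face's outward normal points toward the camera."""
    to_camera = [camera_center[i] - face_point[i] for i in range(3)]
    return sum(normal[i] * to_camera[i] for i in range(3)) > 0


def reproject_edge(edge, rotation, translation, focal_length):
    """Project both endpoints of a model edge; None if either is behind."""
    a = project_point(edge[0], rotation, translation, focal_length)
    b = project_point(edge[1], rotation, translation, focal_length)
    if a is None or b is None:
        return None
    return (a, b)
```

            Drawing the returned 2D segments over the photograph, only for
            faces that pass is_front_facing, gives an overlay of the kind
            shown in this figure.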

fig08.tif   Three views of the recovered high school model, rendered
            as flat-shaded polygons.  The twelve recovered camera positions
            are all visible in the bottom picture.

fig09.tif   A novel view of the high school building (from about 25 feet
            above the ground) rendered with the view-dependent texture-
            mapping method.  Some artifacts due to uneven exposure in the
            images can be seen toward the right of the image.  Some trees
            were masked out of the original images to produce this
            rendering.

fig10abc.tif
            A reconstruction of Hoover Tower in Palo Alto, California.
            As in fig02, this reconstruction is also made from a single
            photograph.  The first image shows the original photograph,
            with approximately 50 user-marked edges.  The second image
            shows the recovered model; since the top of the tower was not
            visible in the photograph, its height had to be guessed.
            The last image shows the results of projecting the first image
            onto the recovered model.  The blue regions indicate areas
            that could not be seen in the original photograph.

fig11.tif   The process of view-dependent texture mapping.  The top two
            images show projecting two individual images onto the building.
            The bottom left image shows how both projections can be
            composited using our view-dependent weighting function.
            The final image shows the results of compositing all twelve
            images using view-dependent texture-mapping.
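
            The compositing step can be illustrated with a small sketch.
            The exact weighting function is not given in this file, so the
            code below assumes one plausible choice: each source image is
            weighted by the inverse of the angle between its viewing
            direction and that of the virtual camera, measured at the
            surface point.  The function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two nonzero 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def view_dependent_weights(surface_point, virtual_camera, source_cameras,
                           eps=1e-6):
    """Normalized per-image weights for one surface point.

    Assumption: weight is inversely proportional to the viewing-angle
    difference; eps avoids division by zero when the views coincide.
    """
    to_virtual = [virtual_camera[i] - surface_point[i] for i in range(3)]
    raw = []
    for camera in source_cameras:
        to_source = [camera[i] - surface_point[i] for i in range(3)]
        raw.append(1.0 / (angle_between(to_virtual, to_source) + eps))
    total = sum(raw)
    return [w / total for w in raw]


def blend(colors, weights):
    """Weighted average of RGB colors sampled from each source image."""
    return tuple(
        sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3)
    )
```

            Under this assumption, a photograph taken from nearly the same
            direction as the virtual camera dominates the blend, and the
            weights change smoothly as the virtual camera moves.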

fig13.tif   The benefit of view-dependent texture mapping.  (a) A detail
            view of the high school model.  (b) A rendering of the model
            from the same position using view-dependent texture mapping.
            Note that although the model does not capture the slightly
            recessed windows, the windows appear properly recessed because
            the texture map is sampled primarily from a photograph which
            viewed the windows from approximately the same direction.
            (c) The same piece of the model viewed from a different angle,
            using the same texture map as in (b).  Since the texture is
            not selected from an image that viewed the model from
            approximately the same angle, the recessed windows appear
            unnatural.  (d) A more natural result obtained by using
            view-dependent texture mapping.  Since the angle of view in
            (d) differs from that in (b), a different composition of
            original images is used to texture-map the model.

fig14a.tif  Key, warped offset, and offset images used in the model-based
fig14b.tif  stereo algorithm.  The key and offset images are original
fig14c.tif  pictures of the entrance to Peterhouse chapel at Cambridge
            University.  The warped offset image was created by projecting
            the offset image onto a very basic model (two quadrilaterals)
            of the entrance, and then reprojecting into the key camera
            position.  As a result, the structure of the scene is
            relatively easy to recover by comparing the key and
            warped offset images, rather than directly comparing the
            key and offset images.
            
fig14d.tif  A disparity map computed by the model-based stereo algorithm.
            The brightness values are a function of the distance
            between the computed depth of the actual scene and the
            depth predicted by the approximate model.  This disparity
            map can then be used to produce a depth map for the key
            image.
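
            As a rough illustration of how such a map relates to depth,
            the sketch below encodes the signed difference between scene
            depth and model depth as an 8-bit brightness and decodes it
            again.  The linear mapping, the mid-gray zero point, the sign
            convention, and the function names are all assumptions made
            for this example; the encoding used in fig14d.tif is not
            specified here.

```python
def residual_to_brightness(residual, max_abs_residual):
    """Map a signed depth residual to an 8-bit brightness value.

    Assumption: zero residual maps to mid-gray, and residuals are
    clamped to [-max_abs_residual, +max_abs_residual].
    """
    clamped = max(-max_abs_residual, min(max_abs_residual, residual))
    return round(127.5 + 127.5 * clamped / max_abs_residual)


def brightness_to_depth(brightness, model_depth, max_abs_residual):
    """Recover scene depth from brightness plus the model's depth."""
    residual = (brightness - 127.5) / 127.5 * max_abs_residual
    return model_depth + residual
```

            Applying brightness_to_depth at every pixel, with model_depth
            taken from the approximate model, would yield a depth map for
            the key image up to the quantization of the 8-bit encoding.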

fig16a.tif  Rendered views of the recovered chapel facade model, which are
fig16b.tif  full-size images of frames 68, 0, and 290 of movie6.mov.
fig16c.tif  A depth map for each of four key images was recovered using
            model-based stereo.  For each rendering, all four images were
            warped to the desired viewpoint using image-based rendering
            techniques.  Lastly, the four warped images were composited
            using view-dependent texture-mapping to produce the final
            rendering.

========== QuickTime Movies

movie1.mov  Four images projected onto recovered high school model.
            A shadow buffer algorithm is used to compute which parts
            of the model are visible from the original camera
            positions.
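
            The visibility test can be sketched in the spirit of a shadow
            buffer: record the nearest depth seen at each pixel from the
            original camera, then treat a surface point as visible only if
            it is not farther than that recorded depth.  This is a
            simplified illustration that uses point samples instead of
            rasterized polygons; simple_project (a camera at the origin
            looking along +z) and the other names are hypothetical.

```python
def simple_project(point, focal_length=100.0):
    """Project to integer pixel coordinates; returns (px, py, depth)."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera
    return (round(focal_length * x / z), round(focal_length * y / z), z)


def build_depth_buffer(samples, project):
    """Keep the nearest depth per pixel over all surface samples."""
    buffer = {}
    for point in samples:
        projected = project(point)
        if projected is None:
            continue
        px, py, depth = projected
        if (px, py) not in buffer or depth < buffer[(px, py)]:
            buffer[(px, py)] = depth
    return buffer


def is_visible(point, project, buffer, bias=1e-3):
    """True if no recorded surface sample lies in front of the point."""
    projected = project(point)
    if projected is None:
        return False
    px, py, depth = projected
    # The small bias keeps a surface from occluding itself.
    return depth <= buffer.get((px, py), float("inf")) + bias
```

            Only the parts of the model that pass this test would receive
            color from the corresponding photograph.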

movie2.mov  All twelve images projected onto the high school model,
            composited with view-dependent texture-mapping.  Some
            trees and signs can be seen incorrectly projected onto the
            surface of the building.

movie3.mov  Same as movie2.mov, with obstructions (signs, trees) masked
            out of the original images.

movie4.mov  Fly-around of the chapel facade rendered with traditional
            texture-mapping.  The facade appears flat.

movie5.mov  Fly-around of the chapel facade rendered with view-dependent
            texture-mapping of four images.  No model-based stereo detail
            recovery has been performed.  Since the model is such a rough
            approximation to the scene's surface, view-dependent
            texture-mapping produces an undesirable amount of blurring.

movie6.mov  Fly-around of chapel facade with geometric detail recovered
            from model-based stereo and composited with view-dependent
            texture-mapping using the same four images.  Since the original
            images are warped according to the scene's recovered structure,
            rather than the approximate structure of the model, the
            composited renderings are more realistic.